Development and Evaluation of a Korean Treebank and its Application to NLP

Authors

  • Chung-hye Han
  • Na-Rare Han
  • Eon-Suk Ko
  • Martha Palmer
Abstract

This paper discusses issues in building a 54-thousand-word Korean Treebank using phrase structure annotation, along with the development of annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. The methods employed for quality control are presented, together with an evaluation of the quality of the Treebank and some of the NLP applications under development that use it.
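As a rough illustration of what phrase structure annotation looks like to a program, the sketch below reads one bracketed tree and lists its words. The bracket notation, the labels (S, NP, VP, N, V), the English toy sentence, and the helper names `tokenize`, `parse`, and `leaves` are all assumptions made for this example; none of them is taken from the Korean Treebank or its guidelines.

```python
from typing import List, Tuple, Union

# A tree is either a leaf token (str) or a list: [label, child, child, ...].
Tree = Union[str, list]


def tokenize(text: str) -> List[str]:
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], pos: int = 0) -> Tuple[list, int]:
    """Parse one constituent starting at tokens[pos], which must be '('.

    Returns the parsed node and the index just past its closing ')'.
    """
    if tokens[pos] != "(":
        raise ValueError("constituent must start with '('")
    node = [tokens[pos + 1]]  # first symbol after '(' is the phrase label
    pos += 2
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)  # nested constituent
            node.append(child)
        else:
            node.append(tokens[pos])  # leaf word
            pos += 1
    return node, pos + 1


def leaves(tree: Tree) -> List[str]:
    """Collect the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:  # skip the label at index 0
        words.extend(leaves(child))
    return words


# Hypothetical toy tree, not drawn from any real corpus.
example = "(S (NP (N cat)) (VP (V sleeps)))"
tree, _ = parse(tokenize(example))
words = leaves(tree)
```

Counting the leaves of every tree in a corpus file is one simple way to arrive at a word total such as the one quoted for the Treebank, though the paper's own counting procedure is not described here.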

Similar resources

Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning

A treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks, such as speech recognition, spoken language systems, parsing and machine translation. Treebanks can be developed in different ways, which can generally be categorized into manual and statistical approaches. While the resulting treebank in each of these methods contains annotation errors,...

Penn Korean Treebank : Development and Evaluation

With growing interest in Korean language processing, numerous natural language processing (NLP) tools for Korean, such as part-of-speech (POS) taggers, morphological analyzers, and parsers, have been developed. This progress was possible through the availability of large-scale raw text corpora and POS-tagged corpora (ETRI, 1999; Yoon and Choi, 1999a; Yoon and Choi, 1999b). However, no large-scale...

The Tibidabo Treebank El treebank Tibidabo

This paper describes work in progress for the creation of a new open-source resource for Spanish: an HPSG-based treebank called Tibidabo. The annotation is performed semi-automatically. First, the corpus is automatically annotated by a symbolic HPSG-based grammar for Spanish implemented on the Linguistic Knowledge Builder system; then, the output is manually disambiguated. The existence of ...

A Hidden Contributor to the Korean Miracle: The Korean Credit Union Movement

Korean credit unions (CUs) are considered to be a hidden contributor to the “Korean miracle”, characterized by remarkable economic growth and relatively low income inequality. The Korean miracle not only generated wealth for an economically strapped and socially underprivileged people, but also contributed to regional community development and the democratization of Korean society. In...

A Treebank of Spanish and its Application to Parsing

This paper presents joint research between a Spanish team and an American one on the development and exploitation of a Spanish treebank. Such treebanks for other languages have proven valuable for the development of high-quality parsers and for a wide variety of language studies. However, when the project started, at the end of 1997, there was no syntactically annotated corpus for Spanish. This...

Publication date: 2002